    Data Mining for Software Engineering

    Harvey: A Greybox Fuzzer for Smart Contracts

    We present Harvey, an industrial greybox fuzzer for smart contracts, which are programs managing accounts on a blockchain. Greybox fuzzing is a lightweight test-generation approach that effectively detects bugs and security vulnerabilities. However, greybox fuzzers randomly mutate program inputs to exercise new paths; this makes it challenging to cover code that is guarded by narrow checks, which are satisfied by no more than a few input values. Moreover, most real-world smart contracts transition through many different states during their lifetime, e.g., for every bid in an auction. To explore these states and thereby detect deep vulnerabilities, a greybox fuzzer would need to generate sequences of contract transactions, e.g., by creating bids from multiple users, while at the same time keeping the search space and test suite tractable. In this experience paper, we explain how Harvey alleviates both challenges with two key fuzzing techniques and distill the main lessons learned. First, Harvey extends standard greybox fuzzing with a method for predicting new inputs that are more likely to cover new paths or reveal vulnerabilities in smart contracts. Second, it fuzzes transaction sequences in a targeted and demand-driven way. We have evaluated our approach on 27 real-world contracts. Our experiments show that the underlying techniques significantly increase Harvey's effectiveness in achieving high coverage and detecting vulnerabilities, in most cases orders of magnitude faster; they also reveal new insights about contract code.
    Comment: arXiv admin note: substantial text overlap with arXiv:1807.0787
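
    The narrow-check problem described in this abstract can be illustrated with a minimal sketch of a plain coverage-guided (greybox) fuzzing loop. This is not Harvey's implementation and does not include its input-prediction or transaction-sequence techniques; the toy `target` program, the branch names, and the single-bit-flip mutator are all assumptions made for illustration.

```python
import random


def target(x: int) -> set:
    """Toy program under test (hypothetical); returns the branch ids it executed."""
    covered = {"entry"}
    if x % 2 == 0:
        covered.add("even")
    if x == 0xBEEF:  # narrow check: exactly one 16-bit value satisfies it
        covered.add("narrow")
    return covered


def mutate(x: int, rng: random.Random) -> int:
    """Flip one randomly chosen bit of a 16-bit input."""
    return x ^ (1 << rng.randrange(16))


def greybox_fuzz(seed_input: int, iterations: int, rng: random.Random):
    """Keep a mutated input in the corpus only if it covers a new branch."""
    corpus = [seed_input]
    seen = set(target(seed_input))
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        new_branches = target(candidate) - seen
        if new_branches:  # coverage feedback decides what is kept
            corpus.append(candidate)
            seen |= new_branches
    return corpus, seen
```

    Calling `greybox_fuzz(1, 200, random.Random(0))` will typically reach the `even` branch quickly, whereas the `narrow` branch needs one specific value out of 65,536 and is rarely, if ever, reached by random bit flips alone.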

    Improving Software Productivity and Quality via Mining Source Code

    The major goal of software development is to deliver high-quality software efficiently. To achieve this goal, programmers often reuse existing frameworks or libraries, hereafter referred to as libraries, instead of developing similar code artifacts from scratch. However, programmers often face challenges in reusing existing libraries due to two major factors. First, many existing libraries are not well documented. Even when documentation exists, it is often outdated. Second, many existing libraries expose a large number of application programming interfaces (APIs), which represent interfaces through which libraries expose their functionalities. For example, the .NET base library provides nearly 10,000 API classes. Due to these two factors, there exist three major problems that affect both software productivity and quality. First, programmers often spend more time in reusing existing libraries, thereby reducing software productivity. Second, programmers introduce defects while using APIs due to a lack of proper knowledge of how to reuse those APIs. Third, existing white-box test generation techniques face challenges in effectively generating test inputs for the client code that reuses libraries. To address these three issues, in this dissertation, we propose a general framework, called WebMiner, that uses existing open source code available on the web by leveraging a code search engine. In particular, WebMiner infers usage specifications for API methods under analysis by automatically collecting relevant code examples from the open source code available on the web. WebMiner next applies data mining techniques on those collected code examples to identify common patterns, which represent likely usage of APIs, referred to as API usage specifications. The primary reason for identifying common patterns is the observation that the majority of programmers correctly adhere to API usage specifications, so those common patterns are likely to represent the correct usage of APIs. We further propose six approaches based on our general framework, where each approach focuses on a specific software engineering (SE) task such as detecting defects in an application under analysis. In particular, the first two approaches assist programmers in effectively reusing APIs provided by existing libraries. The next two approaches use mined API usage specifications as programming rules and detect defects in applications under analysis as deviations from the mined specifications. Finally, the last two approaches mine static and dynamic traces, respectively, for effectively generating test inputs that achieve high structural coverage of the code under test. We also propose another approach that addresses a major issue with mining-based approaches: they are not effective in scenarios where usage information is not available for the API methods under analysis or is not sufficient to achieve the SE task under analysis. Our empirical results show that the approaches developed based on our WebMiner framework effectively address the respective SE tasks handled by those approaches. In particular, our empirical results demonstrate the effectiveness of expanding the data scope of mining-based approaches to large open source code available on the web. Our results also show that our approaches address queries posted in developer forums and detect new defects that are not detected by existing related approaches, thereby improving both software productivity and quality.
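
    The idea of treating common patterns in collected code examples as likely API usage specifications can be sketched as follows. This is only an illustration under stated assumptions, not WebMiner's mining algorithm: it counts ordered call pairs, and the function name `mine_ordered_pairs`, the `min_support` threshold, and the example call names are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_ordered_pairs(examples, min_support):
    """Return ordered call pairs (a before b) whose support meets the threshold.

    `examples` is a list of call sequences, one per collected code example.
    Support is the fraction of examples in which the pair occurs in order.
    """
    counts = Counter()
    for calls in examples:
        # Count each pair at most once per example.
        pairs = {(calls[i], calls[j]) for i, j in combinations(range(len(calls)), 2)}
        counts.update(pairs)
    total = len(examples)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}


# Hypothetical call sequences extracted from three code examples.
examples = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "read"],  # deviates from the majority: no close
]
frequent = mine_ordered_pairs(examples, min_support=0.6)
```

    Here the pairs `("open", "read")` and `("open", "close")` each occur in two of the three examples and pass the threshold, so the third example, which never calls `close`, stands out as a deviation from a frequent pattern.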

    Mining exception-handling rules as sequence association rules

    Programming languages such as Java and C++ provide exception-handling constructs to handle exception conditions. Applications are expected to handle these exception conditions and take necessary recovery actions such as releasing opened database connections. However, exception-handling rules that describe these necessary recovery actions are often not available in practice. To address this issue, we develop a novel approach that mines exception-handling rules as sequence association rules of the form “$(FC_c^1 \ldots FC_c^n) \wedge FC_a \Rightarrow (FC_e^1 \ldots FC_e^m)$”. This rule describes that function call $FC_a$ should be followed by a sequence of function calls $(FC_e^1 \ldots FC_e^m)$ when $FC_a$ is preceded by a sequence of function calls $(FC_c^1 \ldots FC_c^n)$. Rules of this form are required to characterize common exception-handling rules. We show the usefulness of these mined rules by applying them to five real-world applications (including 285 KLOC) to detect violations in our evaluation. Our empirical results show that our approach mines 294 real exception-handling rules in these five applications and also detects 160 defects, where 87 defects are new defects that are not found by a previous related approach.
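
    To make the rule form concrete, the following minimal sketch checks one such rule against a flat call trace. It is not the paper's mining or detection algorithm: it assumes the trace lists the calls observed along a single exceptional path, and the function names and the JDBC-style call names in the usage note are illustrative assumptions.

```python
def is_subsequence(pattern, calls):
    """True if the items of `pattern` occur in `calls` in order (gaps allowed)."""
    remaining = iter(calls)
    return all(item in remaining for item in pattern)


def find_violations(trace, context, trigger, consequent):
    """Return indices of `trigger` calls that break the rule context AND trigger => consequent.

    A trigger call violates the rule when the context sequence precedes it
    but the consequent sequence does not follow it.
    """
    violations = []
    for i, call in enumerate(trace):
        if call != trigger:
            continue
        preceded = is_subsequence(context, trace[:i])
        followed = is_subsequence(consequent, trace[i + 1:])
        if preceded and not followed:
            violations.append(i)
    return violations
```

    For example, with context `["getConnection", "createStatement"]`, trigger `"executeUpdate"`, and consequent `["rollback", "close"]`, a trace that performs the update and then only calls `close` is reported as a violation at the index of `executeUpdate`, while a trace that never establishes the context is not.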

    PARSEWeb: A programmer assistant for reusing open source code on the web

    Programmers commonly reuse existing frameworks or libraries to reduce software development effort. One common problem in reusing existing frameworks or libraries is that programmers know what type of object they need, but do not know which method sequence yields that object. To help programmers address this issue, we have developed an approach that takes queries of the form “Source object type → Destination object type” as input, and suggests relevant method-invocation sequences that can serve as solutions that yield the destination object from the source object given in the query. Our approach interacts with a code search engine (CSE) to gather relevant code samples and performs static analysis over the gathered samples to extract required sequences. As code samples are collected on demand through the CSE, our approach is not limited to queries of any specific set of frameworks or libraries. We have implemented our approach with a tool called PARSEWeb, and conducted four different evaluations to show that our approach is effective in addressing programmers’ queries. We also show that PARSEWeb performs better than existing related tools: Prospector and Strathcona.
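
    The shape of a “Source → Destination” query and its answer can be sketched with a breadth-first search over method signatures. This is a deliberate simplification and not PARSEWeb's approach, which extracts sequences from gathered code samples via static analysis; the `suggest_sequence` function and the `File`, `Stream`, and `Reader` types are hypothetical.

```python
from collections import deque


def suggest_sequence(methods, source, destination):
    """Find a shortest method-invocation sequence from `source` type to `destination` type.

    `methods` is a list of (receiver_type, method_name, return_type) triples,
    assumed here to have been collected from code samples beforehand.
    Returns a list of call strings, or None if no sequence exists.
    """
    graph = {}
    for receiver, name, returned in methods:
        graph.setdefault(receiver, []).append((name, returned))

    queue = deque([(source, [])])
    visited = {source}
    while queue:
        current, path = queue.popleft()
        if current == destination:
            return path
        for name, returned in graph.get(current, []):
            if returned not in visited:
                visited.add(returned)
                queue.append((returned, path + [f"{current}.{name}()"]))
    return None


# Hypothetical signatures; none of these belong to a real library.
methods = [
    ("File", "open", "Stream"),
    ("Stream", "reader", "Reader"),
    ("File", "name", "String"),
]
answer = suggest_sequence(methods, "File", "Reader")
```

    For the query `File → Reader`, the search returns the two-step sequence `File.open()` followed by `Stream.reader()`; a query with no connecting methods returns `None`.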

    Making Exceptions on Exception Handling

    The exception-handling mechanism has been widely adopted to deal with exception conditions that may arise during program executions. To produce high-quality programs, developers are expected to handle these exception conditions and take necessary recovery or resource-releasing actions. Failing to handle these exception conditions can lead not only to performance degradation, but also to critical issues. Developers can write formal specifications to capture expected exception-handling behavior, and then apply tools to automatically analyze program code for detecting specification violations. However, in practice, developers rarely write formal specifications. To address this issue, mining techniques have been used to mine common exception-handling behavior out of program code. In this paper, we discuss challenges and achievements in precisely specifying and mining formal exception-handling specifications, as tackled by our previous work. Our key insight is that expected exception-handling behavior may be “conditional” or may need to accommodate “exceptional” cases. I